Kaggle NFL 2020 Competition¶
Columns¶
Each row in the file corresponds to a single player's involvement in a single play. The dataset was intentionally joined (i.e. denormalized) to make the API simple. All the columns are contained in one large dataframe which is grouped and provided by PlayId.
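To make the denormalized layout concrete, here is a minimal sketch on a toy frame. All values are hypothetical, and the toy uses 3 players per play instead of a full set of players; only the column names come from the dataset. It shows that play-level columns such as Yards repeat on every player row, and how to collapse the frame to one row per play.

```python
import pandas as pd

# Toy data with hypothetical values: 2 plays x 3 player rows each
toy = pd.DataFrame({
    "PlayId": [1, 1, 1, 2, 2, 2],
    "NflId": [10, 11, 12, 10, 11, 13],
    "X": [45.0, 46.5, 44.2, 60.1, 61.3, 59.8],
    "Yards": [3, 3, 3, 8, 8, 8],
})

# One group per play, one row per player in that play
rows_per_play = toy.groupby("PlayId").size()

# Play-level columns hold a single value within each play
yards_is_constant = bool(toy.groupby("PlayId")["Yards"].nunique().eq(1).all())

# Collapse to one row per play before computing play-level statistics
plays = toy.drop_duplicates(subset="PlayId")[["PlayId", "Yards"]].reset_index(drop=True)

print(rows_per_play)
print(yards_is_constant)
print(plays)
```

Working on the collapsed `plays` frame avoids counting the same play once per player when summarising the target.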
- GameId - a unique game identifier
- PlayId - a unique play identifier
- Team - home or away
- X - player position along the long axis of the field. See figure below.
- Y - player position along the short axis of the field. See figure below.
- S - speed in yards/second
- A - acceleration in yards/second^2
- Dis - distance traveled from prior time point, in yards
- Orientation - orientation of player (deg)
- Dir - angle of player motion (deg)
- NflId - a unique identifier of the player
- DisplayName - player's name
- JerseyNumber - jersey number
- Season - year of the season
- YardLine - the yard line of the line of scrimmage
- Quarter - game quarter (1-5, 5 == overtime)
- GameClock - time on the game clock
- PossessionTeam - team with possession
- Down - the down (1-4)
- Distance - yards needed for a first down
- FieldPosition - which side of the field the play is happening on
- HomeScoreBeforePlay - home team score before play started
- VisitorScoreBeforePlay - visitor team score before play started
- NflIdRusher - the NflId of the rushing player
- OffenseFormation - offense formation
- OffensePersonnel - offensive team positional grouping
- DefendersInTheBox - number of defenders lined up near the line of scrimmage, spanning the width of the offensive line
- DefensePersonnel - defensive team positional grouping
- PlayDirection - direction the play is headed
- TimeHandoff - UTC time of the handoff
- TimeSnap - UTC time of the snap
- Yards - the yardage gained on the play (you are predicting this)
- PlayerHeight - player height (ft-in)
- PlayerWeight - player weight (lbs)
- PlayerBirthDate - birth date (mm/dd/yyyy)
- PlayerCollegeName - where the player attended college
- Position - the player's position (the specific role on the field that they typically play)
- HomeTeamAbbr - home team abbreviation
- VisitorTeamAbbr - visitor team abbreviation
- Week - week into the season
- Stadium - stadium where the game is being played
- Location - city where the game is being played
- StadiumType - description of the stadium environment
- Turf - description of the field surface
- GameWeather - description of the game weather
- Temperature - temperature (deg F)
- Humidity - humidity
- WindSpeed - wind speed in miles/hour
- WindDirection - wind direction
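Several of these columns arrive as strings and need parsing before they can be used numerically. The sketch below is one possible way to do it, assuming the formats given in the descriptions (ft-in for PlayerHeight, mm/dd/yyyy for PlayerBirthDate, ISO UTC timestamps for TimeSnap and TimeHandoff). The sample values and the derived column names (`HeightInches`, `SnapToHandoffSeconds`, `AgeYears`) are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows in the formats described in the column list
toy = pd.DataFrame({
    "PlayerHeight": ["6-2", "5-11"],
    "PlayerBirthDate": ["12/29/1988", "03/25/1989"],
    "TimeSnap": ["2017-09-08T00:44:05.000Z", "2017-09-08T00:44:05.000Z"],
    "TimeHandoff": ["2017-09-08T00:44:06.000Z", "2017-09-08T00:44:06.000Z"],
})

# "ft-in" -> total inches
feet_inches = toy["PlayerHeight"].str.split("-", expand=True).astype(int)
toy["HeightInches"] = 12 * feet_inches[0] + feet_inches[1]

# UTC timestamps -> seconds elapsed between snap and handoff
snap = pd.to_datetime(toy["TimeSnap"], utc=True)
handoff = pd.to_datetime(toy["TimeHandoff"], utc=True)
toy["SnapToHandoffSeconds"] = (handoff - snap).dt.total_seconds()

# mm/dd/yyyy birth date -> approximate age in years at the handoff
birth = pd.to_datetime(toy["PlayerBirthDate"], format="%m/%d/%Y", utc=True)
toy["AgeYears"] = (handoff - birth).dt.days / 365.25

print(toy[["HeightInches", "SnapToHandoffSeconds", "AgeYears"]])
```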
Field¶

In [88]:
import pandas as pd
In [89]:
train_df = pd.read_csv('nfl/train.csv', dtype={'FieldPosition': str}, low_memory=False)
In [110]:
# Sort all player rows chronologically by handoff time
sorted_data = train_df.sort_values(by=['TimeHandoff'], ascending=[True])
sorted_data[['PlayId', 'TimeHandoff']].head(50)
Out[110]:
| | PlayId | TimeHandoff |
|---|---|---|
| 0 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 21 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 20 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 19 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 18 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 17 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 16 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 14 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 13 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 12 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 11 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 15 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 9 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 10 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 2 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 3 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 4 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 1 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 6 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 7 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 8 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 5 | 20170907000118 | 2017-09-08T00:44:06.000Z |
| 34 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 43 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 42 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 41 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 40 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 39 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 37 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 36 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 35 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 33 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 38 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 31 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 23 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 24 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 25 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 26 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 22 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 27 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 28 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 29 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 30 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 32 | 20170907000139 | 2017-09-08T00:44:27.000Z |
| 55 | 20170907000189 | 2017-09-08T00:45:17.000Z |
| 64 | 20170907000189 | 2017-09-08T00:45:17.000Z |
| 63 | 20170907000189 | 2017-09-08T00:45:17.000Z |
| 62 | 20170907000189 | 2017-09-08T00:45:17.000Z |
| 61 | 20170907000189 | 2017-09-08T00:45:17.000Z |
| 60 | 20170907000189 | 2017-09-08T00:45:17.000Z |
In [91]:
import matplotlib.pyplot as plt

# Yards is repeated on every player row, so keep one row per play before counting
play_yards = train_df.drop_duplicates(subset='PlayId')['Yards']

plt.figure(figsize=(10,7))
play_yards.hist(bins=50, edgecolor='k', alpha=0.7)
plt.title('Distribution of Yards Gained')
plt.xlabel('Yards Gained')
plt.ylabel('Number of Plays')
plt.grid(False)
plt.show()
Exploratory Analysis¶
In [118]:
# @title Filter By GameId
GameId = 2017090700 # @param {type:"integer"}
# Start by checking how many distinct plays we have for this game
game_data = train_df[train_df['GameId'] == GameId]
game_data['PlayId'].unique().shape
Out[118]:
(52,)
In [122]:
from skimage import io
image = io.imread("https://mrinaldi2.github.io/mkdocs-material-nfl/assets/images/AmFBfield.png")
Draw Field Positions¶
The field image used here was modified from the original image credited below.
Image attribution:
By The original uploader was Xyzzy n at English Wikipedia. - Transferred from en.wikipedia to Commons., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2257082
In [145]:
# List of unique PlayIds for the game
play_ids = game_data['PlayId'].unique()
print(play_ids)

# Plot every play of the game on a grid with three columns
num_cols = 3
num_rows = -(-len(play_ids) // num_cols)  # ceiling division
fig, axes = plt.subplots(num_rows, num_cols, figsize=(30, 126), squeeze=False)

for idx, play_id in enumerate(play_ids):
    ax = axes[idx // num_cols, idx % num_cols]
    play_data = game_data[game_data['PlayId'] == play_id]
    away = play_data[play_data['Team'] == 'away']
    home = play_data[play_data['Team'] == 'home']
    rusher = play_data[play_data['NflId'] == play_data['NflIdRusher']]

    yard_line = play_data['YardLine'].iloc[0]
    yards = play_data['Yards'].iloc[0]
    field_position = play_data['FieldPosition'].iloc[0]
    possession_team = play_data['PossessionTeam'].iloc[0]
    home_team_abbr = play_data['HomeTeamAbbr'].iloc[0]

    # Convert YardLine to an x-coordinate on the 120-yard field (10-yard end zones).
    # Note: this assumes the home team's half is the right-hand side and that the
    # home team attacks towards the left; PlayDirection is not taken into account.
    if field_position == home_team_abbr:
        absolute_yard_line = 10 + 50 + (50 - yard_line)
    else:
        absolute_yard_line = 10 + yard_line
    if possession_team == home_team_abbr:
        end_x = absolute_yard_line - yards
    else:
        end_x = absolute_yard_line + yards

    ax.imshow(image, extent=[-1, 121, -1, 54])
    ax.scatter(away['X'], away['Y'], c='red', s=100)
    ax.scatter(home['X'], home['Y'], c='blue', s=100)
    ax.scatter(rusher['X'], rusher['Y'], color='purple', s=100)
    ax.axvline(absolute_yard_line, color='yellow', linestyle='--', linewidth=2, label='Yard Line')
    ax.hlines(y=25, xmin=absolute_yard_line, xmax=end_x, color='yellow', linestyle='--', linewidth=2, label='Yards Gained')
    ax.set_title(f"Play {play_id}, Rusher: {rusher['Position'].iloc[0]}, Yards: {yards}")
    ax.set_xlim(0, 120)
    ax.set_ylim(0, 53)
    ax.grid(False)

# Hide any unused axes in the last row
for ax in axes.flat[len(play_ids):]:
    ax.axis('off')

plt.tight_layout()
plt.show()
[20170907000118 20170907000139 20170907000189 20170907000345 20170907000395 20170907000473 20170907000516 20170907000653 20170907000680 20170907000801 20170907000917 20170907001004 20170907001077 20170907001156 20170907001177 20170907001296 20170907001355 20170907001376 20170907001443 20170907001488 20170907001509 20170907001530 20170907001551 20170907001605 20170907001664 20170907001715 20170907001736 20170907001819 20170907001955 20170907002430 20170907002648 20170907002669 20170907002829 20170907002900 20170907002961 20170907003138 20170907003161 20170907003261 20170907003444 20170907003465 20170907003507 20170907003635 20170907003874 20170907004025 20170907004046 20170907004182 20170907004314 20170907004465 20170907004486 20170907004622 20170907004660 20170907004721]
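The plotting cell above derives `absolute_yard_line` from FieldPosition and HomeTeamAbbr only, which fixes each team to one end of the field. As a hedged alternative, the sketch below uses PlayDirection instead. It rests on assumptions not stated in this notebook: that PlayDirection holds the strings 'left' or 'right', that X runs from 0 to 120 with 10-yard end zones, and that FieldPosition and PossessionTeam use the same team abbreviations. The helper names `scrimmage_x` and `play_end_x` are hypothetical.

```python
def scrimmage_x(yard_line, field_position, possession_team, play_direction):
    """Return the x-coordinate (0-120) of the line of scrimmage.

    Assumes play_direction is 'left' or 'right' and that the end zones
    are 10 yards deep (assumptions, not confirmed by the notebook).
    """
    # Distance from the offense's own goal line, in yards
    if field_position == possession_team or yard_line == 50:
        own_yards = yard_line
    else:
        own_yards = 100 - yard_line
    # Offense moving right starts from the left goal line (x=10),
    # offense moving left starts from the right goal line (x=110)
    if play_direction == 'right':
        return 10 + own_yards
    return 110 - own_yards


def play_end_x(start_x, yards, play_direction):
    """Return the x-coordinate where the play ends after gaining `yards`."""
    sign = 1 if play_direction == 'right' else -1
    return start_x + sign * yards


# Hypothetical example: offense 'NE' on its own 35, moving right, gains 4 yards
start = scrimmage_x(35, 'NE', 'NE', 'right')
end = play_end_x(start, 4, 'right')
print(start, end)
```

If these assumptions hold for the data, the two helpers could replace the `if`/`else` blocks that compute the yard line and the end of the run in the plotting loop.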